How Programs Execute: CPU, RAM & Memory Management
Overview
When you run a program, an intricate dance occurs between the CPU, RAM, and storage. This document explains exactly how your computer transforms code into actions.
Computer Architecture Fundamentals
CPU Architecture
Inside the CPU
Key CPU Components
| Component | Purpose | Typical Latency |
|---|---|---|
| Registers | Store immediate data for processing | 1 cycle (~0.3 ns) |
| L1 Cache | First-level cache, fastest memory | 3-4 cycles (~1 ns) |
| L2 Cache | Second-level cache | 10-20 cycles (~3 ns) |
| L3 Cache | Third-level cache, shared | 30-70 cycles (~10 ns) |
| RAM | Main system memory | 100-300 cycles (~100 ns) |
| SSD | Solid state storage | ~50,000 ns |
| HDD | Hard disk drive | ~5,000,000 ns |
Program Execution Flow
From Storage to Execution
Memory Hierarchy
CPU Instruction Cycle (Fetch-Decode-Execute)
Example: Adding Two Numbers
Instruction: ADD R1, R2, R3 (R1 = R2 + R3)
1. FETCH:
   - PC = 0x1000 (program counter points to the instruction)
   - Load the instruction from memory address 0x1000
   - PC = 0x1004 (advance to the next instruction)
2. DECODE:
   - Opcode: ADD
   - Operand 1: R2 (register 2)
   - Operand 2: R3 (register 3)
   - Destination: R1 (register 1)
3. EXECUTE:
   - Read the value in R2 (e.g., 10)
   - Read the value in R3 (e.g., 20)
   - ALU performs: 10 + 20 = 30
4. STORE:
   - Write the result (30) to R1
   - Update flags (zero flag, carry flag, etc.)
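To make the four phases concrete, here is a minimal sketch in C of a toy register machine stepping through the same ADD R1, R2, R3 instruction. The `Instruction` and `Cpu` structs, the opcode value, and the index-based program counter are all invented for illustration and do not correspond to any real instruction set.

```c
#include <stdio.h>
#include <stdint.h>

/* Toy encoded instruction: an opcode plus three register numbers. */
typedef struct { uint8_t opcode, dst, src1, src2; } Instruction;
enum { OP_ADD = 1 };

typedef struct {
    uint32_t pc;                  /* program counter (indexes instructions here,
                                     unlike the byte addresses in the text above) */
    int32_t regs[8];              /* R0..R7 */
    const Instruction *memory;    /* pretend instruction memory */
} Cpu;

static void step(Cpu *cpu) {
    /* FETCH: read the instruction the PC points at, then advance the PC. */
    Instruction inst = cpu->memory[cpu->pc];
    cpu->pc += 1;

    /* DECODE: the instruction fields name the operation and the registers. */
    switch (inst.opcode) {
    case OP_ADD: {
        /* EXECUTE: the "ALU" adds the two source registers. */
        int32_t result = cpu->regs[inst.src1] + cpu->regs[inst.src2];
        /* STORE (write-back): place the result in the destination register. */
        cpu->regs[inst.dst] = result;
        break;
    }
    }
}

int main(void) {
    const Instruction program[] = { { OP_ADD, 1, 2, 3 } };  /* ADD R1, R2, R3 */
    Cpu cpu = { .pc = 0, .regs = {0}, .memory = program };
    cpu.regs[2] = 10;
    cpu.regs[3] = 20;

    step(&cpu);
    printf("R1 = %d\n", cpu.regs[1]);   /* prints: R1 = 30 */
    return 0;
}
```

A real CPU does the same work in hardware, and its instruction fetch goes through the cache hierarchy shown in the table above.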
RAM Organization for a Program
Memory Layout of a Process
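A quick way to see the layout in practice is to print one address from each region, as in the sketch below. Segment placement and ordering depend on the OS, compiler, and address space layout randomization (ASLR), so treat the output only as an illustration of the segments explained next.

```c
#include <stdio.h>
#include <stdlib.h>

int initialized_global = 42;   // data segment
int uninitialized_global;      // BSS segment

void marker(void) {}           // machine code lives in the text segment

int main(void) {
    int local = 7;                        // stack
    int *heap = malloc(sizeof *heap);     // heap

    // Casting a function pointer to void* is a common but non-standard idiom.
    printf("text  (function)      : %p\n", (void *)&marker);
    printf("data  (init. global)  : %p\n", (void *)&initialized_global);
    printf("bss   (uninit. global): %p\n", (void *)&uninitialized_global);
    printf("heap  (malloc)        : %p\n", (void *)heap);
    printf("stack (local)         : %p\n", (void *)&local);

    free(heap);
    return 0;
}
```

On a typical Linux build, the text address is lowest, data and BSS sit just above it, the heap comes next, and the stack lies near the top of the address space.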
Memory Segments Explained
1. Text/Code Segment
- Contains compiled machine code (instructions)
- Read-only and executable
- Shared among multiple instances of same program
- Fixed size at load time
2. Data Segment
- Initialized Data: Global and static variables with initial values
```c
int globalVar = 100;    // Stored in data segment
static int count = 0;   // Stored in data segment
```
3. BSS Segment (Block Started by Symbol)
- Uninitialized global and static variables
- Automatically initialized to zero
- Doesn't occupy space in executable file
```c
int globalArray[1000];  // Stored in BSS
static int flag;        // Stored in BSS
```
4. Heap
- Dynamic memory allocation
- Grows upward toward higher addresses
- Managed by the programmer (malloc/free, new/delete)
- Allocations persist until explicitly freed or the program ends
```c
int* ptr = malloc(sizeof(int) * 100);  // Allocated on heap
```
5. Stack
- Automatic memory allocation
- Grows downward toward lower addresses
- Stores local variables, function parameters, return addresses
- Automatically cleaned up when function returns
```c
void function() {
    int localVar = 10;  // Stored on stack
}
```
Variable Storage in Memory
Example Program Analysis
```c
#include <stdio.h>
#include <stdlib.h>

int globalVar = 100;             // Data segment
static int staticVar = 200;      // Data segment
int uninitGlobal;                // BSS segment

void function(int param) {       // param on stack
    int localVar = 10;           // Stack
    static int staticLocal = 5;  // Data segment
    int* heapVar = malloc(sizeof(int));  // Pointer on stack, data on heap
    *heapVar = 20;               // Value stored on heap

    printf("Address of param:    %p\n", (void*)&param);
    printf("Address of localVar: %p\n", (void*)&localVar);
    printf("Address of heapVar:  %p\n", (void*)heapVar);

    free(heapVar);
}

int main() {
    int mainLocal = 5;  // Stack
    function(mainLocal);
    return 0;
}
```
How CPU Executes Instructions
Assembly to Machine Code
CPU Registers During Execution
Complete Program Execution Example
Simple C Program
```c
int main() {
    int a = 5;
    int b = 10;
    int c = a + b;
    return c;
}
```
Step-by-Step Execution
Memory Access Pattern
Function Call Stack
How Function Calls Work
Function Call Example
```c
void function2(int x) {        // x is copied into function2's stack frame
    int local2 = x * 2;
    return;
}

void function1(int y) {        // y is copied into function1's stack frame
    int local1 = y + 1;
    function2(local1);         // pushes a new frame on top of function1's
    return;
}

int main() {
    int a = 5;
    function1(a);              // pushes function1's frame on top of main's
    return 0;
}
```
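One rough way to observe those frames is to print the address of a local variable in each function, as in the sketch below. On most mainstream platforms the addresses decrease as the calls nest, matching the "stack grows downward" rule, but comparing addresses across frames like this is for illustration only and is not something portable code should rely on.

```c
#include <stdio.h>

void function2(int x) {
    int local2 = x * 2;
    printf("function2 frame: %p (local2 = %d)\n", (void *)&local2, local2);
}

void function1(int y) {
    int local1 = y + 1;
    printf("function1 frame: %p\n", (void *)&local1);
    function2(local1);
}

int main(void) {
    int a = 5;
    printf("main frame:      %p\n", (void *)&a);
    function1(a);
    return 0;
}
```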
Dynamic Memory Allocation
Heap Management
malloc/free Process
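As a concrete sketch of the allocate-use-free lifecycle, including the failure check that real code needs, consider the following; the element count and values are arbitrary.

```c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    size_t count = 100;

    // Request a block from the heap allocator.
    int *numbers = malloc(count * sizeof *numbers);
    if (numbers == NULL) {          // malloc can fail: always check
        perror("malloc");
        return EXIT_FAILURE;
    }

    // Use the block like an ordinary array.
    for (size_t i = 0; i < count; i++) {
        numbers[i] = (int)(i * i);
    }
    printf("numbers[10] = %d\n", numbers[10]);

    // Return the block to the allocator; the pointer is now dangling.
    free(numbers);
    numbers = NULL;                 // avoid accidental reuse

    return 0;
}
```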
CPU Pipeline
Modern CPUs Execute Multiple Instructions Simultaneously
Pipeline Stages
- Fetch (IF): Get instruction from memory
- Decode (ID): Interpret instruction and read registers
- Execute (EX): Perform operation in ALU
- Memory (MEM): Access memory if needed
- Writeback (WB): Write result to register
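The pipeline is pure hardware, but its effect can be felt from C: when every operation depends on the previous result, later instructions must wait, while independent operations can overlap in flight. The sketch below contrasts a single dependency chain with four independent partial sums; any actual speedup depends on the compiler and CPU, so treat it as an illustration rather than a benchmark.

```c
#include <stdio.h>
#include <stddef.h>

// One long dependency chain: each addition needs the previous sum,
// so later additions cannot start until earlier ones finish.
long sum_dependent(const long *a, size_t n) {
    long sum = 0;
    for (size_t i = 0; i < n; i++) {
        sum += a[i];
    }
    return sum;
}

// Four independent partial sums: the additions within one iteration do not
// depend on each other, so the pipeline can keep several in flight at once.
long sum_independent(const long *a, size_t n) {
    long s0 = 0, s1 = 0, s2 = 0, s3 = 0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s0 += a[i];
        s1 += a[i + 1];
        s2 += a[i + 2];
        s3 += a[i + 3];
    }
    for (; i < n; i++) {   // leftover elements
        s0 += a[i];
    }
    return s0 + s1 + s2 + s3;
}

int main(void) {
    long data[1000];
    for (size_t i = 0; i < 1000; i++) data[i] = (long)i;
    printf("%ld %ld\n", sum_dependent(data, 1000), sum_independent(data, 1000));
    return 0;
}
```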
Cache Memory
How Cache Works
Cache Line Example
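A cache line on most desktop CPUs is 64 bytes, so fetching one int also pulls its neighbors into the cache. The sketch below sums the same matrix in two orders: row-major order reuses each fetched line, while column-major order jumps 4 KiB between accesses and keeps missing. The matrix size, the 64-byte line, and the resulting slowdown are platform-dependent assumptions.

```c
#include <stdio.h>

#define ROWS 1024
#define COLS 1024

static int matrix[ROWS][COLS];  // stored row by row in memory

// Row-major: consecutive accesses fall in the same cache line.
long sum_row_major(void) {
    long sum = 0;
    for (int r = 0; r < ROWS; r++)
        for (int c = 0; c < COLS; c++)
            sum += matrix[r][c];
    return sum;
}

// Column-major: each access lands 4 KiB away from the previous one,
// so nearly every access touches a different cache line.
long sum_col_major(void) {
    long sum = 0;
    for (int c = 0; c < COLS; c++)
        for (int r = 0; r < ROWS; r++)
            sum += matrix[r][c];
    return sum;
}

int main(void) {
    for (int r = 0; r < ROWS; r++)
        for (int c = 0; c < COLS; c++)
            matrix[r][c] = 1;
    printf("row-major sum: %ld\n", sum_row_major());
    printf("col-major sum: %ld\n", sum_col_major());
    return 0;
}
```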
Virtual Memory
Virtual to Physical Address Translation
Page Table Structure
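The arithmetic behind translation is straightforward: split the virtual address into a page number and an offset, look the page number up in the page table to find a physical frame, then recombine. The sketch below models this with a flat single-level table and 4 KiB pages; real page tables are multi-level structures managed by the OS, so this is only an illustration.

```c
#include <stdio.h>
#include <stdint.h>

#define PAGE_SIZE 4096u   // 4 KiB pages (assumption)
#define NUM_PAGES 16u     // tiny toy address space

// Toy page-table entry: a valid bit plus the physical frame number.
typedef struct { int valid; uint32_t frame; } PageTableEntry;

static PageTableEntry page_table[NUM_PAGES];

// Translate a virtual address; returns 0 on a page fault.
int translate(uint32_t vaddr, uint32_t *paddr) {
    uint32_t page   = vaddr / PAGE_SIZE;   // virtual page number
    uint32_t offset = vaddr % PAGE_SIZE;   // offset stays the same

    if (page >= NUM_PAGES || !page_table[page].valid) {
        return 0;                          // page fault: the OS must step in
    }
    *paddr = page_table[page].frame * PAGE_SIZE + offset;
    return 1;
}

int main(void) {
    // Map virtual page 2 to physical frame 7.
    page_table[2].valid = 1;
    page_table[2].frame = 7;

    uint32_t paddr;
    if (translate(2 * PAGE_SIZE + 0x123, &paddr)) {
        printf("virtual 0x%X -> physical 0x%X\n",
               (unsigned)(2 * PAGE_SIZE + 0x123), (unsigned)paddr);
    }
    return 0;
}
```

On a miss (an invalid entry), the hardware raises a page fault and the operating system either maps the page in or terminates the process.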
Complete System View
Performance Comparison
Access Time Comparison
Human-Scale Time Analogy
If accessing a CPU register took 1 second, here's how long other operations would take:
| Memory Level | Actual Time | If Register = 1 Second |
|---|---|---|
| CPU Register | 0.3 ns | 1 second |
| L1 Cache | 1 ns | 3 seconds |
| L2 Cache | 3 ns | 10 seconds |
| L3 Cache | 10 ns | 33 seconds |
| RAM | 100 ns | 5.5 minutes |
| SSD | 50 μs | 1.9 days |
| HDD | 5 ms | 6.4 months |
Key Takeaways
- Speed vs Size Trade-off: Faster memory is exponentially more expensive and smaller
- Locality Matters: Programs that access nearby memory locations run faster due to caching
- Cache is Critical: Modern CPUs spend significant silicon area on cache to bridge the speed gap
- RAM is Slow: Despite being "fast" by human standards, RAM is ~100x slower than L1 cache
- Disk is Extremely Slow: SSDs are roughly 170,000x slower than registers; HDDs are about 16 million times slower